29 research outputs found
Semi-Supervised Generation with Cluster-aware Generative Models
Deep generative models trained with large amounts of unlabelled data have
proven to be powerful within the domain of unsupervised learning. Many
real-life data sets contain a small number of labelled data points, which are
typically disregarded when training generative models. We propose the
Cluster-aware Generative Model, that uses unlabelled information to infer a
latent representation that models the natural clustering of the data, and
additional labelled data points to refine this clustering. The generative
performance of the model improves significantly when labelled information is
exploited, obtaining a log-likelihood of -79.38 nats on permutation-invariant
MNIST, while also achieving competitive semi-supervised classification
accuracies. The model can also be trained fully unsupervised, and still
improves the log-likelihood relative to related methods.
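The way labels can refine an otherwise unsupervised clustering can be sketched with a toy assignment step. This is a minimal NumPy illustration only; the function names and the softmax parameterization are assumptions, not the paper's actual inference network:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cluster_assignments(logits, labels):
    """Return per-point cluster distributions q(y|x).

    Unlabelled points (label == -1) keep the inference network's soft
    assignment; labelled points are clamped to a one-hot vector, which
    is what lets the few labels refine the clustering.
    """
    q = softmax(logits)
    n, k = q.shape
    for i, y in enumerate(labels):
        if y >= 0:  # labelled point: clamp to the observed cluster
            q[i] = np.eye(k)[y]
    return q

# Toy example: 3 points, 2 clusters; only the second point is labelled.
logits = np.array([[2.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
q = cluster_assignments(logits, labels=[-1, 0, -1])
print(q[1])  # clamped by the label
```

In a full model these assignments would enter the variational bound; here they only show how labelled and unlabelled points are treated differently.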
Auxiliary Deep Generative Models
Deep generative models parameterized by neural networks have recently
achieved state-of-the-art performance in unsupervised and semi-supervised
learning. We extend deep generative models with auxiliary variables, which
improve the variational approximation. The auxiliary variables leave the
generative model unchanged but make the variational distribution more
expressive. Inspired by the structure of the auxiliary variable we also propose
a model with two stochastic layers and skip connections. Our findings suggest
that more expressive and properly specified deep generative models converge
faster with better results. We show state-of-the-art performance within
semi-supervised learning on MNIST, SVHN and NORB datasets.
Comment: Proceedings of the 33rd International Conference on Machine Learning,
New York, NY, USA, 2016. JMLR: Workshop and Conference Proceedings volume 48.
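The effect of an auxiliary variable on the variational distribution can be illustrated with a toy example: even when q(a|x) and q(z|a,x) are both Gaussian, the marginal q(z|x) is a continuous mixture and can be multimodal. The specific densities below are hypothetical, chosen only to make the effect visible:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inference distributions for a single fixed datapoint x:
#   q(a | x)    = N(0, 1)
#   q(z | a, x) = N(tanh(3a), 0.1^2)   (mean depends nonlinearly on a)
a = rng.normal(0.0, 1.0, size=100000)   # a ~ q(a|x)
z = rng.normal(np.tanh(3 * a), 0.1)     # z ~ q(z|a,x)

# Marginalizing the auxiliary variable, q(z|x) = E_a[q(z|a,x)] is a
# continuous mixture of Gaussians: here it is clearly bimodal (mass
# near -1 and +1), which a single Gaussian q(z|x) cannot represent.
near_modes = np.mean(np.abs(np.abs(z) - 1.0) < 0.3)
near_zero = np.mean(np.abs(z) < 0.3)
print(near_modes, near_zero)
```

This is the sense in which the auxiliary variable makes the variational distribution more expressive while the generative model p(x, z) is untouched.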
Recurrent Spatial Transformer Networks
We integrate the recently proposed spatial transformer network (SPN)
[Jaderberg et al. 2015] into a recurrent neural network (RNN) to form an
RNN-SPN model. We use the RNN-SPN to classify digits in cluttered MNIST
sequences. The proposed model achieves a single-digit error of 1.5% compared to
2.9% for a convolutional network and 2.0% for a convolutional network with SPN
layers. The SPN outputs a zoomed, rotated and skewed version of the input
image. We investigate different down-sampling factors (the ratio of pixels in
the input and output) for the SPN and show that the RNN-SPN model is able to down-sample
the input images without deteriorating performance. The down-sampling in
RNN-SPN can be thought of as adaptive down-sampling that minimizes the
information loss in the regions of interest. We attribute the superior
performance of the RNN-SPN to its ability to attend to a sequence of regions
of interest.
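The SPN's zoom-and-down-sample behaviour amounts to sampling the input image through an affine grid. Below is a minimal NumPy sketch of such a sampler; the function name and the toy image are assumptions, and in the real model the affine parameters theta are predicted by a network rather than fixed:

```python
import numpy as np

def affine_grid_sample(img, theta, out_h, out_w):
    """Sample img through a 2x3 affine map `theta`, spatial-transformer
    style: output coordinates in [-1, 1] are mapped into input
    coordinates and bilinearly interpolated."""
    h, w = img.shape
    ys = np.linspace(-1.0, 1.0, out_h)
    xs = np.linspace(-1.0, 1.0, out_w)
    gy, gx = np.meshgrid(ys, xs, indexing="ij")
    # Map normalized output coordinates through the affine transform.
    sx = theta[0, 0] * gx + theta[0, 1] * gy + theta[0, 2]
    sy = theta[1, 0] * gx + theta[1, 1] * gy + theta[1, 2]
    # Convert to pixel coordinates in the input image.
    px = (sx + 1.0) * (w - 1) / 2.0
    py = (sy + 1.0) * (h - 1) / 2.0
    x0 = np.clip(np.floor(px).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(py).astype(int), 0, h - 2)
    fx, fy = px - x0, py - y0
    # Bilinear interpolation of the four neighbouring pixels.
    top = img[y0, x0] * (1 - fx) + img[y0, x0 + 1] * fx
    bot = img[y0 + 1, x0] * (1 - fx) + img[y0 + 1, x0 + 1] * fx
    return top * (1 - fy) + bot * fy

# A 16x16 "cluttered" image with a bright 4x4 patch in the top-left.
img = np.zeros((16, 16))
img[2:6, 2:6] = 1.0
# Zoom into the top-left quadrant while producing an 8x8 output
# (down-sampling factor 2): scale 0.5, translate towards (-0.5, -0.5).
theta = np.array([[0.5, 0.0, -0.5],
                  [0.0, 0.5, -0.5]])
crop = affine_grid_sample(img, theta, 8, 8)
print(crop.shape, crop.max())
```

The down-sampling factor is set by the ratio of input pixels covered to output pixels produced; an RNN would emit one theta per glimpse to attend to a sequence of such regions.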
Utilizing Domain Knowledge in End-to-End Audio Processing
End-to-end neural-network-based approaches to audio modelling are generally
outperformed by models trained on high-level data representations. In this
paper we present preliminary work that shows the feasibility of training the
first layers of a deep convolutional neural network (CNN) model to learn the
commonly-used log-scaled mel-spectrogram transformation. Secondly, we
demonstrate that upon initializing the first layers of an end-to-end CNN
classifier with the learned transformation, convergence and performance on the
ESC-50 environmental sound classification dataset are similar to a CNN-based
model trained on the highly pre-processed log-scaled mel-spectrogram features.
Comment: Accepted at the ML4Audio workshop at the NIPS 201
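As a point of reference, the target transformation can be written out directly: the windowed DFT is a convolution with fixed sinusoidal filters, the mel filterbank is a 1x1 convolution, and the log is a pointwise nonlinearity, which is why the first CNN layers can learn (or be initialized to) it. The code below is a generic log-mel sketch, not the authors' implementation, and all parameter values are assumptions:

```python
import numpy as np

def log_mel_spectrogram(x, sr=16000, n_fft=512, hop=256, n_mels=40):
    """Minimal log-scaled mel-spectrogram of a 1-D signal x."""
    window = np.hanning(n_fft)
    # Frame the signal and window each frame (a strided convolution).
    frames = np.stack([x[i:i + n_fft] * window
                       for i in range(0, len(x) - n_fft + 1, hop)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (frames, bins)

    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    # Triangular mel filterbank: a fixed linear map over FFT bins.
    mel_pts = mel_to_hz(np.linspace(0, hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return np.log(power @ fb.T + 1e-10)

# One second of a 440 Hz tone.
t = np.arange(16000) / 16000.0
spec = log_mel_spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)
```

Every stage here is differentiable, so a CNN front-end trained to regress these features can later be fine-tuned end-to-end.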
BIVA: A Very Deep Hierarchy of Latent Variables for Generative Modeling
With the introduction of the variational autoencoder (VAE), probabilistic
latent variable models have received renewed attention as powerful generative
models. However, their performance in terms of test likelihood and quality of
generated samples has been surpassed by autoregressive models without
stochastic units. Furthermore, flow-based models have recently been shown to be
an attractive alternative that scales well to high-dimensional data. In this
paper we close the performance gap by constructing VAE models that can
effectively utilize a deep hierarchy of stochastic variables and model complex
covariance structures. We introduce the Bidirectional-Inference Variational
Autoencoder (BIVA), characterized by a skip-connected generative model and an
inference network formed by a bidirectional stochastic inference path. We show
that BIVA reaches state-of-the-art test likelihoods, generates sharp and
coherent natural images, and uses the hierarchy of latent variables to capture
different aspects of the data distribution. We observe that BIVA, in contrast
to recent results, can be used for anomaly detection. We attribute this to the
hierarchy of latent variables which is able to extract high-level semantic
features. Finally, we extend BIVA to semi-supervised classification tasks and
show that it performs comparably to state-of-the-art results by generative
adversarial networks.
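The bidirectional inference path can be sketched for a two-layer hierarchy: a deterministic bottom-up pass, then a stochastic top-down pass in which each lower latent conditions on both the bottom-up features and the already-sampled higher latent. This NumPy sketch uses random fixed weights purely to show the information flow; all dimensions, names, and parameterizations are assumptions, not BIVA's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions and hypothetical fixed "network" weights.
dx, dh, dz = 8, 6, 3
W_bu1 = rng.normal(size=(dh, dx))   # bottom-up layer 1
W_bu2 = rng.normal(size=(dh, dh))   # bottom-up layer 2
W_q2 = rng.normal(size=(2 * dz, dh))        # params of q(z2 | x)
W_q1 = rng.normal(size=(2 * dz, dh + dz))   # params of q(z1 | x, z2)

def gaussian_sample(params):
    # params concatenates mean and log-std of a diagonal Gaussian.
    mu, log_sigma = np.split(params, 2)
    return mu + np.exp(log_sigma) * rng.normal(size=mu.shape)

def infer(x):
    # Deterministic bottom-up path.
    d1 = np.tanh(W_bu1 @ x)
    d2 = np.tanh(W_bu2 @ d1)
    # Stochastic top-down path: the top latent is sampled first, and
    # the lower latent conditions on BOTH the bottom-up features and
    # the sampled top-down state -- the "bidirectional" inference.
    z2 = gaussian_sample(W_q2 @ d2)
    z1 = gaussian_sample(W_q1 @ np.concatenate([d1, z2]))
    return z1, z2

z1, z2 = infer(rng.normal(size=dx))
print(z1.shape, z2.shape)
```

The skip connections in the generative model mirror this structure, letting each latent layer influence the data directly rather than only through the layer below.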
Hierarchical VAEs Know What They Don't Know
Deep generative models have been demonstrated as state-of-the-art density
estimators. Yet, recent work has found that they often assign a higher
likelihood to data from outside the training distribution. This seemingly
paradoxical behavior has caused concerns over the quality of the attained
density estimates. In the context of hierarchical variational autoencoders, we
provide evidence to explain this behavior by out-of-distribution data having
in-distribution low-level features. We argue that this is both expected and
desirable behavior. With this insight in hand, we develop a fast, scalable and
fully unsupervised likelihood-ratio score for OOD detection that requires data
to be in-distribution across all feature-levels. We benchmark the method on a
vast set of data and model combinations and achieve state-of-the-art results on
out-of-distribution detection.
Comment: Appeared in Proceedings of the 38th International Conference on
Machine Learning (ICML 2021). 18 pages, source code available at
https://github.com/JakobHavtorn/hvae-oodd,
https://github.com/vlievin/biva-pytorch and
https://github.com/larsmaaloee/BIV
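The likelihood-ratio score itself is simply the difference between the full ELBO and the bound obtained when the top latent layers are sampled from the prior instead of the inference network. The ELBO values below are hypothetical numbers, chosen only to illustrate why the ratio, unlike the raw likelihood, is low for OOD inputs whose low-level features are in-distribution:

```python
import numpy as np

def llr_score(elbo_full, elbo_skip):
    """Unsupervised likelihood-ratio OOD score, L(x) - L^{>k}(x).

    `elbo_full` is the usual ELBO with all latents inferred from x;
    `elbo_skip` is the bound with the top k latent layers drawn from
    the prior. A large ratio means the high-level latents carry real
    information about x, i.e. x is in-distribution across ALL
    feature levels.
    """
    return np.asarray(elbo_full) - np.asarray(elbo_skip)

# Hypothetical ELBO values (nats), for illustration only.
# In-distribution image: the top latents help substantially.
in_dist = llr_score(elbo_full=-85.0, elbo_skip=-110.0)
# OOD image with in-distribution low-level features: its raw
# likelihood can be HIGHER, yet the top latents add almost nothing.
ood = llr_score(elbo_full=-80.0, elbo_skip=-84.0)
print(in_dist, ood)  # -> 25.0 4.0
```

Thresholding this ratio rather than the likelihood is what lets the method reject OOD inputs that a plain density estimate would score highly.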